@abrichr commented Jan 17, 2026

Summary

  • Adds a new Section 11 to the publication roadmap document that provides a rigorous and honest assessment of what would be required to publish in a main track venue (NeurIPS, ICML, ICLR) rather than a workshop
  • Includes an honest evaluation of why the current work is workshop-level (prompt engineering, not ML research)
  • Provides four technical contribution options with effort estimates to elevate the work
  • Details additional experiments, timeline, resources, and honest recommendations

Changes

11.1 Honest Assessment: Why Current Work is Workshop-Level

  • Core contribution is prompt engineering, not ML research
  • Table of anticipated reviewer concerns with severity levels

11.2 Required Technical Contributions (Options to Elevate)

  • Option A: Learned Demo Retrieval (2-3 months, RECOMMENDED) - Train retrieval to optimize action accuracy
  • Option B: Learned Prompt Synthesis (3-4 months) - Learn optimal demo formatting/compression
  • Option C: Behavioral Cloning with Demo-Augmentation (4-6 months) - Fine-tune VLM with demo attention
  • Option D: Theoretical Analysis (2-3 months) - Information-theoretic analysis

11.3 Additional Experiments Required

  • Full WAA (50+ tasks, 3 seeds), WebArena (100+ tasks)
  • Episode success rate, multi-model comparison, ablation studies
  • Statistical significance requirements
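
One way the statistical-significance requirement above could be checked is a paired bootstrap over tasks, comparing per-task success with and without demos. This is only a minimal sketch under stated assumptions, not the method the roadmap prescribes: the function name `paired_bootstrap_ci`, the input layout (one success rate per task, e.g. averaged over the 3 seeds), and the defaults are all hypothetical.

```python
import random


def paired_bootstrap_ci(baseline, treatment, n_resamples=10_000, alpha=0.05, seed=0):
    """Return (mean_diff, lo, hi) for the per-task difference in success rate.

    baseline, treatment: equal-length lists with one success rate per task
    (hypothetical layout, e.g. each value averaged over 3 seeds).
    """
    if len(baseline) != len(treatment) or not baseline:
        raise ValueError("need equal-length, non-empty inputs")

    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = [t - b for b, t in zip(baseline, treatment)]
    n = len(diffs)
    mean_diff = sum(diffs) / n

    # Resample tasks with replacement and record the mean difference each time.
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()

    # Percentile interval: cut alpha/2 off each tail of the bootstrap distribution.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return mean_diff, lo, hi
```

If the resulting interval excludes zero, the improvement is unlikely to be an artifact of which tasks were sampled. Resampling whole tasks (rather than individual runs) keeps the seeds for a task together, which is the reason for pairing by task here.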

11.4 Timeline and Resources

  • Minimum 6-7 months for main track
  • 1-2 dedicated researchers (FTE)
  • $2-5k GPU compute, $1-3k API credits

11.5 Honest Recommendation

  • Small team: Focus on workshop paper
  • Dedicated resources: Pursue Option A (Learned Retrieval)
  • Clear guidance on when NOT to attempt main track

11.6 Additional References

  • REALM, Atlas, DocPrompting (retrieval-augmented learning)
  • APE, DSPy, PromptBreeder (automatic prompt engineering)
  • CogAgent, SeeClick, RT-2 (GUI agent fine-tuning)

Test plan

  • Verify markdown renders correctly on GitHub
  • Verify Table of Contents link works
  • Review content for accuracy and completeness

Generated with Claude Code

This section provides a rigorous and honest assessment of what would be
required to elevate the current work from workshop-level to main track
publication at venues like NeurIPS, ICML, or ICLR.

Key additions:
- 11.1: Honest assessment of why current work is workshop-level (prompt
  engineering, not ML research) with table of reviewer concerns
- 11.2: Four technical contribution options to elevate the work:
  - Option A: Learned Demo Retrieval (RECOMMENDED, 2-3 months)
  - Option B: Learned Prompt Synthesis (3-4 months)
  - Option C: Behavioral Cloning with Demo-Augmentation (4-6 months)
  - Option D: Theoretical Analysis (2-3 months)
- 11.3: Additional experiments required (WAA 50+ tasks, WebArena 100+,
  multi-model, ablations, statistical significance)
- 11.4: Timeline and resource estimates (6-7 months minimum, 1-2 FTE,
  $5-10k compute/API costs)
- 11.5: Honest recommendation based on team resources
- 11.6: Additional references (REALM, Atlas, DocPrompting, APE, DSPy,
  CogAgent, SeeClick, RT-2)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@abrichr merged commit d4b7ca0 into main on Jan 17, 2026
6 checks passed
@abrichr deleted the feature/publication-roadmap-main-track-path branch on January 17, 2026 05:42